
RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems



Abstract

To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded, while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation in that a single node with the whole object encodes and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing capacity of individual nodes constrain the archival process due to such atomicity. We propose a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up multiple object archival. We further present RapidRAID codes, an explicit family of pipelined erasure codes which provides fast archival without compromising either data reliability or storage overheads. Finally, we provide a real implementation of RapidRAID codes and benchmark its performance using both a cluster of 50 nodes and a set of Amazon EC2 instances. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, while when multiple objects are encoded concurrently, the reduction is up to 20%.
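To make the contrast between atomic and pipelined archival concrete, the sketch below simulates both strategies in plain Python: atomic encoding, where one node computes every coded piece, versus a pipeline in which each node folds its locally stored block into a partial coded piece and forwards it to the next node. This is only an illustration of the pipelining idea described in the abstract; the GF(2^8) arithmetic, the Vandermonde coefficients, and all function names are assumptions made for this sketch, not the actual RapidRAID construction from the paper.

```python
import os

GF_POLY = 0x11D  # a common reducing polynomial for GF(2^8), e.g. in Reed-Solomon codes

def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (bytes) in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return r

def gf_pow(a: int, e: int) -> int:
    """Raise a field element to a non-negative integer power."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def scale_xor(acc: bytes, block: bytes, coeff: int) -> bytes:
    """Return acc + coeff * block, element-wise over GF(2^8)."""
    return bytes(a ^ gf_mul(coeff, b) for a, b in zip(acc, block))

def atomic_encode(blocks: list[bytes], coeffs: list[list[int]]) -> list[bytes]:
    """Atomic archival: a single node holds all k blocks, computes every coded
    piece itself, and uploads them all, bearing the full CPU and network load."""
    size = len(blocks[0])
    pieces = []
    for row in coeffs:                      # one coded piece per coefficient row
        acc = bytes(size)
        for coeff, block in zip(row, blocks):
            acc = scale_xor(acc, block, coeff)
        pieces.append(acc)
    return pieces

def node_step(partial: bytes, local_block: bytes, coeff: int) -> bytes:
    """Work done by ONE node in the pipeline: fold the locally stored block
    into the partial piece received from the previous node, then forward it."""
    return scale_xor(partial, local_block, coeff)

def pipelined_encode(blocks: list[bytes], coeffs: list[list[int]]) -> list[bytes]:
    """Pipelined archival sketch: block j lives on node j (e.g. as a replica).
    Each iteration of the inner loop would run on a different node, so the
    multiplications and transfers for one object's archival are spread over
    k machines instead of being concentrated on a single encoder."""
    size = len(blocks[0])
    pieces = []
    for row in coeffs:
        partial = bytes(size)               # the first node starts from zero
        for coeff, local_block in zip(row, blocks):
            partial = node_step(partial, local_block, coeff)
        pieces.append(partial)              # the last node stores the finished piece
    return pieces

if __name__ == "__main__":
    k, n = 4, 6                                              # illustrative code parameters
    blocks = [os.urandom(1024) for _ in range(k)]            # k data blocks of one object
    coeffs = [[gf_pow(i + 1, j) for j in range(k)] for i in range(n)]  # Vandermonde rows
    assert atomic_encode(blocks, coeffs) == pipelined_encode(blocks, coeffs)
    print(f"atomic and pipelined strategies produce the same {n} coded pieces")
```

Both functions compute the same codeword; the point of the sketch is where the work happens. In the pipelined variant each `node_step` call stands for a different machine, which is how the strategy spreads the encoding load and speeds up archival.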
